Social Text Normalization using Contextual Graph Random Walks
Abstract
We introduce a social media text normalization system that can be deployed as a preprocessing step for machine translation and other NLP applications that handle social media text. The system learns normalization equivalences from unlabeled text without supervision: it runs random walks on a contextual-similarity bipartite graph constructed from n-gram sequences in a large unlabeled text corpus. We show that the approach achieves a very high precision (92.43) and a reasonable recall (56.4). When used as a preprocessing step for a state-of-the-art machine translation system, it improved translation quality on social media text by 6%. The approach is domain and language independent.
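The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' system. It assumes trigram contexts, co-occurrence counts as edge weights, and a walk that stops as soon as it reaches an in-vocabulary word. The toy corpus, the vocabulary, and every function name (`build_bipartite_graph`, `weighted_step`, `normalization_candidates`) are hypothetical and chosen for illustration.

```python
import random
from collections import Counter, defaultdict


def build_bipartite_graph(sentences, n=3):
    """Connect each centre word to its n-gram context (assumed design).

    One side of the graph holds words, the other holds contexts with the
    centre word masked by "_". Edge weights are co-occurrence counts.
    """
    word_to_ctx = defaultdict(Counter)
    ctx_to_word = defaultdict(Counter)
    half = n // 2
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(half, len(tokens) - half):
            context = tuple(tokens[i - half:i] + ["_"] + tokens[i + 1:i + half + 1])
            word_to_ctx[tokens[i]][context] += 1
            ctx_to_word[context][tokens[i]] += 1
    return word_to_ctx, ctx_to_word


def weighted_step(rng, neighbours):
    """Pick one neighbour with probability proportional to its edge weight."""
    nodes = list(neighbours)
    weights = [neighbours[node] for node in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


def normalization_candidates(noisy, graph, vocabulary, walks=200, max_steps=4, seed=0):
    """Count how often walks starting at `noisy` end on an in-vocabulary word."""
    word_to_ctx, ctx_to_word = graph
    rng = random.Random(seed)
    hits = Counter()
    if noisy not in word_to_ctx:
        return hits
    for _ in range(walks):
        word = noisy
        for _ in range(max_steps):
            # One step = word -> shared context -> word.
            context = weighted_step(rng, word_to_ctx[word])
            word = weighted_step(rng, ctx_to_word[context])
            if word in vocabulary:  # in-vocabulary words absorb the walk
                hits[word] += 1
                break
    return hits


# Hypothetical toy corpus: "2morrow" shares contexts with "tomorrow".
corpus = [
    "see you tomorrow morning",
    "see you 2morrow morning",
    "call me tomorrow night",
    "call me 2morrow night",
    "call me every night",
]
vocabulary = {"see", "you", "tomorrow", "morning", "call", "me", "night", "every"}

graph = build_bipartite_graph(corpus)
hits = normalization_candidates("2morrow", graph, vocabulary)
best_candidate = hits.most_common(1)[0][0]
print(best_candidate, dict(hits))
```

In this sketch, words that share more contexts with the noisy token collect more walk endings, so they rank higher as normalization candidates. A practical system would need a far larger corpus and probably additional filtering of the candidates, which the sketch omits.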
Similar resources
Context Tailoring for Text Normalization
Language processing tools suffer significant performance drops in the social media domain because its language evolves continuously. Transforming non-standard words into their standard forms has been studied as a step towards proper processing of ill-formed texts. This work describes a normalization system that considers contextual and lexical similarities between standard and non-standard wor...
A Graph-based Approach for Contextual Text Normalization
The informal nature of social media text renders it very difficult to be automatically processed by natural language processing tools. Text normalization, which corresponds to restoring the non-standard words to their canonical forms, provides a solution to this challenge. We introduce an unsupervised text normalization approach that utilizes not only lexical, but also contextual and grammatica...
Movie Recommendation using Random Walks over the Contextual Graph
Recommender systems have become an essential tool in fighting information overload. However, the majority of recommendation algorithms focus only on using ratings information, while disregarding information about the context of the recommendation process. We present ContextWalk, a recommendation algorithm that makes it easy to include different types of contextual information. It models the bro...
NCSU_SAS_WOOKHEE: A Deep Contextual Long-Short Term Memory Model for Text Normalization
To address the challenges of normalizing online conversational texts prevalent in social media, we propose a contextual long-short term memory (LSTM) recurrent neural network based approach, augmented with a self-generated dictionary normalization technique. Our approach utilizes a sequence of characters as well as the part-of-speech associated with words without harnessing any external lexical...
Short Random Walks for Community Discovery in Social Networks
The study of networks is an active area of research due to its capability of modeling many real world complex systems. One such interesting property to investigate in any typical network is the community structure which is the division of networks into groups. The study of community structure in networks is closely related to the ideas of graph partitioning in graph theory. Finding an exact sol...